ERIC: An Efficient and Practical Software Obfuscation Framework
Modern cloud computing systems distribute software executables over a network
to keep the software sources, which are typically compiled in a
security-critical cluster, secret. We develop ERIC, a new, efficient, and
general software obfuscation framework. ERIC protects software against (i)
static analysis, by making only an encrypted version of software executables
available to the human eye, no matter how the software is distributed, and (ii)
dynamic analysis, by guaranteeing that an encrypted executable can only be
correctly decrypted and executed by a single authenticated device. ERIC
comprises key hardware and software components to provide efficient software
obfuscation support: (i) a hardware decryption engine (HDE) enables efficient
decryption of encrypted software in the target device, and (ii) the compiler can
seamlessly encrypt software executables given only a unique device identifier.
Both the hardware and software components are ISA-independent, making ERIC
general. The key idea of ERIC is to use physical unclonable functions (PUFs),
unique device identifiers, as secret keys in encrypting software executables.
Malicious parties that cannot access the PUF in the target device cannot
perform static or dynamic analyses on the encrypted binary. We develop ERIC's
prototype on an FPGA to evaluate it end-to-end. Our prototype extends RISC-V
Rocket Chip with the hardware decryption engine (HDE) to minimize the overheads
of software decryption. We augment the custom LLVM-based compiler to enable
partial/full encryption of RISC-V executables. The HDE incurs minor FPGA
resource overheads: it requires 2.63% more LUTs and 3.83% more flip-flops
compared to the Rocket Chip baseline. LLVM-based software encryption increases
compile time by 15.22% and the executable size by 1.59%. ERIC is publicly
available and can be downloaded from https://github.com/kasirgalabs/ERIC.
Comment: DSN 2022 - The 52nd Annual IEEE/IFIP International Conference on
Dependable Systems and Networks
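To make ERIC's key idea concrete, the following Python sketch shows how an executable's bytes could be bound to a single device: a key is derived from a device-unique value and used to XOR-encrypt the code, so only a device that reproduces the same value decrypts it correctly. This is an illustration under stated assumptions, not ERIC's scheme: the functions `derive_key`, `keystream`, and `xor_crypt` are hypothetical, the `device_id` bytes stand in for a PUF response, and the SHA-256 counter-mode keystream is a swapped-in toy construction rather than the cipher implemented by ERIC's HDE.

```python
import hashlib


def derive_key(device_id: bytes) -> bytes:
    """Hypothetical: hash a device-unique value (stand-in for a PUF response) into a 256-bit key."""
    return hashlib.sha256(b"eric-sketch-key" + device_id).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: SHA-256(key || counter) blocks, truncated to `length` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(data: bytes, device_id: bytes) -> bytes:
    """XOR with the device-bound keystream; XOR is its own inverse, so this encrypts and decrypts."""
    ks = keystream(derive_key(device_id), len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    # Four little-endian copies of the RISC-V instruction `addi a0, zero, 0` (0x00000513).
    text_section = bytes.fromhex("13050000") * 4
    blob = xor_crypt(text_section, b"device-A")          # "compile time": encrypt for device A
    print(xor_crypt(blob, b"device-A") == text_section)  # device A recovers the code
    print(xor_crypt(blob, b"device-B") == text_section)  # a different device does not
```

Partial encryption, as mentioned in the abstract, would correspond to applying such a transform to selected sections of the executable only.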
A Case for Self-Managing DRAM Chips: Improving Performance, Efficiency, Reliability, and Security via Autonomous in-DRAM Maintenance Operations
The memory controller is in charge of managing DRAM maintenance operations
(e.g., refresh, RowHammer protection, memory scrubbing) in current DRAM chips.
Implementing new maintenance operations often necessitates modifications in the
DRAM interface, memory controller, and potentially other system components.
Such modifications are only possible with a new DRAM standard, which takes a
long time to develop, leading to slow progress in DRAM systems.
In this paper, our goal is to 1) ease, and thus accelerate, the process of
enabling new DRAM maintenance operations and 2) enable more efficient in-DRAM
maintenance operations. Our idea is to set the memory controller free from
managing DRAM maintenance. To this end, we propose Self-Managing DRAM (SMD), a
new low-cost DRAM architecture that enables implementing new in-DRAM
maintenance mechanisms (or modifying old ones) with no further changes in the
DRAM interface, memory controller, or other system components. We use SMD to
implement new in-DRAM maintenance mechanisms for three use cases: 1) periodic
refresh, 2) RowHammer protection, and 3) memory scrubbing. We show that SMD
enables easy adoption of efficient maintenance mechanisms that significantly
improve the system performance and energy efficiency while providing higher
reliability compared to conventional DDR4 DRAM. A combination of SMD-based
maintenance mechanisms that perform refresh, RowHammer protection, and memory
scrubbing achieves a 7.6% speedup and consumes 5.2% less DRAM energy on average
across 20 memory-intensive four-core workloads. We make SMD source code openly
and freely available at [128].
Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction
Long-latency load requests continue to limit the performance of
high-performance processors. To increase the latency tolerance of a processor,
architects have primarily relied on two key techniques: sophisticated data
prefetchers and large on-chip caches. In this work, we show that: 1) even a
sophisticated state-of-the-art prefetcher can only predict half of the off-chip
load requests on average across a wide range of workloads, and 2) due to the
increasing size and complexity of on-chip caches, a large fraction of the
latency of an off-chip load request is spent accessing the on-chip cache
hierarchy. The goal of this work is to accelerate off-chip load requests by
removing the on-chip cache access latency from their critical path. To this
end, we propose a new technique called Hermes, whose key idea is to: 1)
accurately predict which load requests might go off-chip, and 2) speculatively
fetch the data required by the predicted off-chip loads directly from the main
memory, while also concurrently accessing the cache hierarchy for such loads.
To enable Hermes, we develop a new lightweight, perceptron-based off-chip load
prediction technique that learns to identify off-chip load requests using
multiple program features (e.g., sequence of program counters). For every load
request, the predictor observes a set of program features to predict whether or
not the load would go off-chip. If the load is predicted to go off-chip, Hermes
issues a speculative request directly to the memory controller once the load's
physical address is generated. If the prediction is correct, the load
eventually misses the cache hierarchy and waits for the ongoing speculative
request to finish, thus hiding the on-chip cache hierarchy access latency from
the critical path of the off-chip load. Our evaluation shows that Hermes
significantly improves the performance of a state-of-the-art baseline. We
open-source Hermes.
Comment: To appear in the 55th IEEE/ACM International Symposium on
Microarchitecture (MICRO), 202
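The off-chip load predictor described in the Hermes abstract follows the general hashed-perceptron pattern: each program feature indexes its own table of small saturating weights, the selected weights are summed, and the sum is compared with a threshold. The Python sketch below illustrates that pattern under stated assumptions; the two features (the load PC, and the load PC XORed with the byte offset within a 64-byte cache line), the table size, the threshold, the training margin, and the weight range are all hypothetical choices, not the configuration used by Hermes.

```python
class PerceptronOffChipPredictor:
    """Hashed-perceptron sketch: one table of saturating weights per program feature."""

    def __init__(self, num_features=2, table_size=1024, threshold=0,
                 train_margin=8, w_min=-16, w_max=15):
        self.table_size = table_size
        self.threshold = threshold        # predict off-chip when the weight sum >= threshold
        self.train_margin = train_margin  # keep training while |sum| is below this value
        self.w_min, self.w_max = w_min, w_max
        self.tables = [[0] * table_size for _ in range(num_features)]

    def _slots(self, features):
        # Modulo indexing stands in for the hash a hardware predictor would use.
        return [(table, f % self.table_size) for table, f in zip(self.tables, features)]

    def predict(self, features):
        total = sum(table[i] for table, i in self._slots(features))
        return total >= self.threshold, total

    def train(self, features, went_off_chip):
        predicted, total = self.predict(features)
        # Update on a misprediction, or when the prediction is not yet confident enough.
        if predicted != went_off_chip or abs(total) < self.train_margin:
            step = 1 if went_off_chip else -1
            for table, i in self._slots(features):
                table[i] = max(self.w_min, min(self.w_max, table[i] + step))


def extract_features(pc, vaddr):
    """Hypothetical feature set: load PC, and load PC XOR byte offset within a 64-byte line."""
    return [pc, pc ^ (vaddr & 0x3F)]


if __name__ == "__main__":
    predictor = PerceptronOffChipPredictor()
    streaming = extract_features(0x400, 0x1000)  # a load that always goes off-chip
    hot = extract_features(0x404, 0x2008)        # a load that always hits on chip
    for _ in range(20):
        predictor.train(streaming, went_off_chip=True)
        predictor.train(hot, went_off_chip=False)
    # A load predicted off-chip would trigger a speculative request to the memory controller.
    print(predictor.predict(streaming)[0], predictor.predict(hot)[0])  # prints: True False
```

The training margin lets weights keep moving after a correct but low-confidence prediction, a common design choice in perceptron-based predictors that makes later predictions more stable.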
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM
Processing-using-memory (PuM) techniques leverage the analog operation of
memory cells to perform computation. Several recent works have demonstrated PuM
techniques in off-the-shelf DRAM devices. Since DRAM is the dominant memory
technology as main memory in current computing systems, these PuM techniques
represent an opportunity for alleviating the data movement bottleneck at very
low cost. However, system integration of PuM techniques imposes non-trivial
challenges that are yet to be solved. Design space exploration of potential
solutions to the PuM integration challenges requires appropriate tools to
develop necessary hardware and software components. Unfortunately, current
specialized DRAM-testing platforms and system simulators do not provide the
flexibility and/or the holistic system view that is necessary to deal with PuM
integration challenges.
We design and develop PiDRAM, the first flexible end-to-end framework that
enables system integration studies and evaluation of real PuM techniques.
PiDRAM provides software and hardware components to rapidly integrate PuM
techniques across the whole system software and hardware stack (e.g., necessary
modifications in the operating system, memory controller). We implement PiDRAM
on an FPGA-based platform along with an open-source RISC-V system. Using
PiDRAM, we implement and evaluate two state-of-the-art PuM techniques: in-DRAM
(i) copy and initialization and (ii) true random number generation. Our results
show that the in-memory copy and initialization techniques can improve the
performance of bulk copy operations by 12.6x and bulk initialization operations
by 14.6x on a real system. Implementing the true random number generator
requires only 190 lines of Verilog and 74 lines of C code using PiDRAM's
software and hardware components.
Comment: To appear in ACM Transactions on Architecture and Code Optimization
Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator
We present Ramulator 2.0, a highly modular and extensible DRAM simulator that
enables rapid and agile implementation and evaluation of design changes in the
memory controller and DRAM to meet the increasing research effort in improving
the performance, security, and reliability of memory systems. Ramulator 2.0
abstracts and models key components in a DRAM-based memory system and their
interactions into shared interfaces and independent implementations. Doing so
enables easy modification and extension of the modeled functions of the memory
controller and DRAM in Ramulator 2.0. The DRAM specification syntax of
Ramulator 2.0 is concise and human-readable, facilitating easy modifications
and extensions. Ramulator 2.0 implements a library of reusable templated lambda
functions to model the functionalities of DRAM commands to simplify the
implementation of new DRAM standards, including DDR5, LPDDR5, HBM3, and GDDR6.
We showcase Ramulator 2.0's modularity and extensibility by implementing and
evaluating a wide variety of RowHammer mitigation techniques that require
different memory controller design changes. These techniques are added
modularly as separate implementations without changing any code in the baseline
memory controller implementation. Ramulator 2.0 is rigorously validated and
maintains a fast simulation speed compared to existing cycle-accurate DRAM
simulators. Ramulator 2.0 is open-sourced under the permissive MIT license at
https://github.com/CMU-SAFARI/ramulator
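The idea of modeling DRAM command behavior with a shared library of small functions can be illustrated in a few lines. The Python sketch below is only an analogy under stated assumptions: Ramulator 2.0 itself is written in C++, and the names `Bank`, `DDR_LIKE_SPEC`, `prereq_read`, and `serve_read` are hypothetical rather than its actual API. A standard is expressed as a table that maps command names to reusable lambdas, and a prerequisite function decides which command must be issued next to serve a read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Bank:
    open_row: Optional[int] = None  # None means the bank is precharged (no open row)


# Hypothetical "standard": command name -> reusable action lambda applied to a bank.
DDR_LIKE_SPEC: Dict[str, Callable[[Bank, int], None]] = {
    "ACT": lambda bank, row: setattr(bank, "open_row", row),   # open the row
    "PRE": lambda bank, row: setattr(bank, "open_row", None),  # close the open row
    "RD": lambda bank, row: None,                              # row-buffer hit: no state change
}


def prereq_read(bank: Bank, row: int) -> str:
    """Return the next command a read to `row` requires, given the bank's current state."""
    if bank.open_row is None:
        return "ACT"
    if bank.open_row != row:
        return "PRE"
    return "RD"


def serve_read(spec: Dict[str, Callable[[Bank, int], None]], bank: Bank, row: int) -> List[str]:
    """Issue commands until the read itself is issued; return the command sequence."""
    issued = []
    while True:
        cmd = prereq_read(bank, row)
        spec[cmd](bank, row)
        issued.append(cmd)
        if cmd == "RD":
            return issued


if __name__ == "__main__":
    bank = Bank()
    print(serve_read(DDR_LIKE_SPEC, bank, 5))  # closed bank: ['ACT', 'RD']
    print(serve_read(DDR_LIKE_SPEC, bank, 5))  # row hit: ['RD']
    print(serve_read(DDR_LIKE_SPEC, bank, 7))  # row conflict: ['PRE', 'ACT', 'RD']
```

In this analogy, supporting another standard means building a new table that reuses these entries and adds only the commands that differ, leaving `serve_read` untouched.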
DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips
To understand and improve DRAM performance, reliability, security and energy
efficiency, prior works study characteristics of commodity DRAM chips.
Unfortunately, state-of-the-art open source infrastructures capable of
conducting such studies are obsolete, poorly supported, or difficult to use, or
so inflexible that they limit the types of studies they can conduct.
We propose DRAM Bender, a new FPGA-based infrastructure that enables
experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three
key features at the same time. First, DRAM Bender enables directly interfacing
with a DRAM chip through its low-level interface. This allows users to issue
DRAM commands in arbitrary order and with finer-grained time intervals compared
to other open source infrastructures. Second, DRAM Bender exposes easy-to-use
C++ and Python programming interfaces, allowing users to quickly and easily
develop different types of DRAM experiments. Third, DRAM Bender is easily
extensible. The modular design of DRAM Bender allows extending it to (i)
support existing and emerging DRAM interfaces, and (ii) run on new commercial
or custom FPGA boards with little effort.
To demonstrate that DRAM Bender is a versatile infrastructure, we conduct
three case studies, two of which lead to new observations about the DRAM
RowHammer vulnerability. In particular, we show that data patterns supported by
DRAM Bender uncover a larger set of bit-flips on a victim row compared to the
data patterns commonly used by prior work. We demonstrate the extensibility of
DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3
support. DRAM Bender is freely and openly available at
https://github.com/CMU-SAFARI/DRAM-Bender.
Comment: To appear in TCAD 202
SpyHammer: Using RowHammer to Remotely Spy on Temperature
RowHammer is a DRAM vulnerability that can cause bit errors in a victim DRAM
row by just accessing its neighboring DRAM rows at a high-enough rate. Recent
studies demonstrate that new DRAM devices are becoming increasingly
vulnerable to RowHammer, and many works demonstrate system-level attacks for
privilege escalation or information leakage. In this work, we leverage two key
observations about RowHammer characteristics to spy on DRAM temperature: 1)
RowHammer-induced bit error rate consistently increases (or decreases) as the
temperature increases, and 2) some DRAM cells that are vulnerable to RowHammer
cause bit errors only at a particular temperature. Based on these observations,
we propose a new RowHammer attack, called SpyHammer, that spies on the
temperature of critical systems such as industrial production lines, vehicles,
and medical systems. SpyHammer is the first practical attack that can spy on
DRAM temperature. SpyHammer can spy on absolute temperature with an error of
less than 2.5 °C at the 90th percentile of tested temperature points, for
12 real DRAM modules from 4 main manufacturers.
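SpyHammer's first observation suggests a simple way to turn a RowHammer measurement into a temperature estimate: calibrate a module's bit error rate (BER) at known temperatures, then invert that curve. The Python sketch below does this with linear interpolation under stated assumptions; the function name `estimate_temperature` is hypothetical, the calibration values are invented placeholders rather than measurements from the paper, and the sketch does not use the second observation about temperature-specific cells.

```python
from bisect import bisect_left
from typing import List, Tuple


def estimate_temperature(ber: float, calibration: List[Tuple[float, float]]) -> float:
    """Estimate temperature (deg C) from a measured RowHammer BER.

    `calibration` holds (temperature_C, ber) pairs sorted by strictly increasing BER,
    so both rising and falling BER-vs-temperature trends are supported.
    Measurements outside the calibrated range are clamped to its end points.
    """
    bers = [b for _, b in calibration]
    if ber <= bers[0]:
        return calibration[0][0]
    if ber >= bers[-1]:
        return calibration[-1][0]
    i = bisect_left(bers, ber)  # first calibration point with BER >= measured BER
    (t0, b0), (t1, b1) = calibration[i - 1], calibration[i]
    return t0 + (ber - b0) * (t1 - t0) / (b1 - b0)


if __name__ == "__main__":
    # Placeholder calibration curve for one hypothetical module (not data from the paper).
    curve = [(50.0, 1.0e-4), (60.0, 1.5e-4), (70.0, 2.1e-4), (80.0, 3.0e-4)]
    print(round(estimate_temperature(1.8e-4, curve), 1))  # prints 65.0
```

Requiring strictly increasing BER in the table keeps the interpolation well defined and lets the same function handle modules whose BER falls as temperature rises.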
RowPress: Amplifying Read Disturbance in Modern DRAM Chips
Memory isolation is critical for system reliability, security, and safety.
Unfortunately, read disturbance can break memory isolation in modern DRAM
chips. For example, RowHammer is a well-studied read-disturb phenomenon where
repeatedly opening and closing (i.e., hammering) a DRAM row many times causes
bitflips in physically nearby rows.
This paper experimentally demonstrates and analyzes another widespread
read-disturb phenomenon, RowPress, in real DDR4 DRAM chips. RowPress breaks
memory isolation by keeping a DRAM row open for a long period of time, which
disturbs physically nearby rows enough to cause bitflips. We show that RowPress
amplifies DRAM's vulnerability to read-disturb attacks by significantly
reducing the number of row activations needed to induce a bitflip by one to two
orders of magnitude under realistic conditions. In extreme cases, RowPress
induces bitflips in a DRAM row when an adjacent row is activated only once. Our
detailed characterization of 164 real DDR4 DRAM chips shows that RowPress 1)
affects chips from all three major DRAM manufacturers, 2) gets worse as DRAM
technology scales down to smaller node sizes, and 3) affects a different set of
DRAM cells from RowHammer and behaves differently from RowHammer as temperature
and access pattern changes.
We demonstrate in a real DDR4-based system with RowHammer protection that 1)
a user-level program induces bitflips by leveraging RowPress while conventional
RowHammer cannot do so, and 2) a memory controller that adaptively keeps the
DRAM row open for a longer period of time based on access pattern can
facilitate RowPress-based attacks. To prevent bitflips due to RowPress, we
describe and evaluate a new methodology that adapts existing RowHammer
mitigation techniques to also mitigate RowPress with low additional performance
overhead. We open source all our code and data to facilitate future research on
RowPress.
Comment: Extended version of the paper "RowPress: Amplifying Read Disturbance
in Modern DRAM Chips" at the 50th Annual International Symposium on Computer
Architecture (ISCA), 202
TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs
Prior works propose SRAM-based TRNGs that extract entropy from SRAM arrays.
SRAM arrays are widely used in the majority of specialized and general-purpose
chips to store data inside the chip. Thus,
SRAM-based TRNGs present a low-cost alternative to dedicated hardware TRNGs.
However, existing SRAM-based TRNGs suffer from 1) low TRNG throughput, 2) high
energy consumption, 3) high TRNG latency, and 4) the inability to generate true
random numbers continuously, which limits the application space of SRAM-based
TRNGs. Our goal in this paper is to design an SRAM-based TRNG that overcomes
these four key limitations and thus extends the application space of
SRAM-based TRNGs. To this end, we propose TuRaN, a new high-throughput,
energy-efficient, and low-latency SRAM-based TRNG that can sustain continuous
operation. TuRaN leverages the key observation that accessing SRAM cells
results in random access failures when the supply voltage is reduced below the
manufacturer-recommended supply voltage. TuRaN generates random numbers at high
throughput by repeatedly accessing SRAM cells with reduced supply voltage and
post-processing the resulting random faults using the SHA-256 hash function. To
demonstrate the feasibility of TuRaN, we conduct SPICE simulations on different
process nodes and analyze the potential of access failure for use as an entropy
source. We verify and support our simulation results by conducting real-world
experiments on two commercial off-the-shelf FPGA boards. We evaluate the
quality of the random numbers generated by TuRaN using the widely-adopted NIST
standard randomness tests and observe that TuRaN passes all tests. TuRaN
generates true random numbers with (i) an average (maximum) throughput of
1.6 Gbps (1.812 Gbps), (ii) 0.11 nJ/bit energy consumption, and (iii) 278.46 µs
latency.
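TuRaN's SHA-256 post-processing step can be sketched as follows: raw access-failure readouts are concatenated, split into fixed-size blocks, and each block is hashed with SHA-256 to produce 256 output bits. This Python sketch is illustrative only; the function name `postprocess`, the 64-byte block size (which compresses raw data 2:1), and the input format are assumptions rather than TuRaN's actual parameters, and no SRAM supply-voltage control is modeled.

```python
import hashlib
from typing import Iterable


def postprocess(raw_failure_maps: Iterable[bytes], block_bytes: int = 64) -> bytes:
    """Condition raw SRAM access-failure data with SHA-256.

    Each full `block_bytes` block of raw data yields one 32-byte digest;
    a trailing partial block is discarded rather than hashed.
    """
    raw = b"".join(raw_failure_maps)
    out = bytearray()
    for start in range(0, len(raw) - block_bytes + 1, block_bytes):
        out += hashlib.sha256(raw[start:start + block_bytes]).digest()
    return bytes(out)


if __name__ == "__main__":
    # Stand-in readouts: in a real system these would be bitmaps of cells that failed
    # when accessed below the manufacturer-recommended supply voltage.
    readouts = [bytes(range(100)), bytes(range(100, 200))]
    random_bytes = postprocess(readouts)
    print(len(random_bytes), random_bytes.hex()[:16])
```

SHA-256 only conditions the data; the randomness itself still has to come from the access failures.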
An Experimental Analysis of RowHammer in HBM2 DRAM Chips
RowHammer (RH) is a significant and worsening security, safety, and
reliability issue of modern DRAM chips that can be exploited to break memory
isolation. Therefore, it is important to understand real DRAM chips' RH
characteristics. Unfortunately, no prior work extensively studies the RH
vulnerability of modern 3D-stacked high-bandwidth memory (HBM) chips, which are
commonly used in modern GPUs.
In this work, we experimentally characterize the RH vulnerability of a real
HBM2 DRAM chip. We show that 1) different 3D-stacked channels of HBM2 memory
exhibit significantly different levels of RH vulnerability (up to 79%
difference in bit error rate), 2) the DRAM rows at the end of a DRAM bank (rows
with the highest addresses) exhibit significantly fewer RH bitflips than other
rows, and 3) a modern HBM2 DRAM chip implements undisclosed RH defenses that
are triggered by periodic refresh operations. We describe the implications of
our observations on future RH attacks and defenses and discuss future work for
understanding RH in 3D-stacked memories.
Comment: To appear at DSN Disrupt 202